An Empirical Study on the Performance of Integrated Hybrid Prediction Model on the Medical Datasets
Authors
Abstract
Medical data are multidimensional: hundreds of independent features in these high-dimensional databases must be considered and analyzed to extract information that is valuable for decision making in medical prediction. Most data mining methods depend on a set of features that define the behavior of the learning algorithm and directly or indirectly influence the complexity of the resulting models. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed. Feature selection is a preprocessing step that aims to reduce the dimensionality of the data by selecting the most informative features that influence the diagnosis of the disease. We propose a feature-selection-embedded Hybrid Prediction Model that combines two different functionalities of data mining: clustering and classification. The F-score feature selection method and k-means clustering select the optimal feature subsets of the medical datasets, which enhance the performance of the Support Vector Machine (SVM) classifier. The performance of the SVM classifier is empirically evaluated on the reduced feature subsets of the Diabetes, Breast Cancer and Heart Disease datasets. The proposed model is validated using four parameters, namely the accuracy of the classifier, the area under the ROC curve, sensitivity and specificity. The results show that the proposed feature-selection-embedded hybrid prediction model improves the predictive power of the classifier and reduces the false positive and false negative rates. The proposed method achieves a predictive accuracy of 98.9427% for the diabetes dataset, 99% for the cancer dataset and 100% for the heart disease dataset, the highest predictive accuracy for these datasets compared to other models reported in the literature.
General Terms: Data Mining, Dimensionality Reduction, Feature Selection, Prediction Model
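The abstract names F-score feature selection but does not state the formula. As a minimal sketch, assuming the commonly used binary-class F-score (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances), feature ranking could look like the following. The function names `f_scores` and `select_features`, the toy data, and the cutoff `k` are illustrative assumptions, not taken from the paper; the k-means and SVM stages are not shown.

```python
from statistics import mean, variance


def f_scores(X, y):
    """Return one F-score per feature column.

    X is a list of rows (lists of floats); y holds binary labels (1 or 0).
    Assumes each class has at least two samples, as sample variance needs.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        col_all = [row[j] for row in X]
        col_pos = [row[j] for row in pos]
        col_neg = [row[j] for row in neg]
        # Numerator: how far each class mean sits from the overall mean.
        between = ((mean(col_pos) - mean(col_all)) ** 2
                   + (mean(col_neg) - mean(col_all)) ** 2)
        # Denominator: spread inside each class (n - 1 denominator).
        within = variance(col_pos) + variance(col_neg)
        # Guard against division by zero for constant columns (a choice made here).
        scores.append(between / within if within > 0 else 0.0)
    return scores


def select_features(X, y, k):
    """Return the column indices of the k highest-scoring features."""
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: column 0 separates the classes, column 1 does not.
X = [
    [1.0, 5.0],
    [2.0, 3.0],
    [1.5, 4.0],
    [8.0, 4.5],
    [9.0, 3.5],
    [8.5, 4.0],
]
y = [0, 0, 0, 1, 1, 1]

scores = f_scores(X, y)
selected = select_features(X, y, k=1)
print(scores, selected)
```

A larger F-score suggests a feature whose class means are well separated relative to its within-class spread, so the reduced subset passed on to the classifier would keep the top-ranked columns.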
Similar resources
AN EXTENDED FUZZY ARTIFICIAL NEURAL NETWORKS MODEL FOR TIME SERIES FORECASTING
Improving time series forecasting accuracy is an important yet often difficult task. Both theoretical and empirical findings have indicated that integration of several models is an effective way to improve predictive performance, especially when the models in combination are quite different. In this paper, a model of the hybrid artificial neural networks and fuzzy model is proposed for time series for...
A Novel Intelligent Energy Management Strategy Based on Combination of Multi Methods for a Hybrid Electric Vehicle
Because of the problems caused by today's conventional vehicles, much attention has been given to research on fuel cell vehicles. However, a fuel cell system alone is not adequate in transportation applications, because the load power profile includes transients that are not compatible with the fuel cell dynamics. To resolve this problem, hybridization of the fuel cell and energy storage device...
Performance evaluation of gang saw using hybrid ANFIS-DE and hybrid ANFIS-PSO algorithms
One of the most significant and effective criteria in the process of cutting dimensional rocks using the gang saw is the maximum energy consumption rate of the machine, and its accurate prediction and estimation can help designers and owners of this industry to achieve an optimal and economic process. In the present research work, it is attempted to study and provide models for predicting the m...
Which Methodology is Better for Combining Linear and Nonlinear Models for Time Series Forecasting?
Both theoretical and empirical findings have suggested that combining different models can be an effective way to improve the predictive performance of each individual model. This holds especially when the models in the ensemble are quite different. Hybrid techniques that decompose a time series into its linear and nonlinear components are one of the most important kinds of the hybrid model...
An Improved Hybrid Model with Automated Lag Selection to Forecast Stock Market
Objective: In general, financial time series such as stock indexes have nonlinear, mutable and noisy behavior. Structural and statistical models and machine learning-based models are often unable to accurately predict series with such behavior. Accordingly, the aim of the present study is to present a new hybrid model using the advantages of the GMDH method and Non-dominated Sorting Genetic A...
An Integrated Model of Project Scheduling and Material Ordering: A Hybrid Simulated Annealing and Genetic Algorithm
This study aims to deal with a more realistic combined problem of project scheduling and material ordering. The goal is to minimize the total material holding and ordering costs by determining the starting time of activities along with material ordering schedules subject to some constraints. The problem is first mathematically modelled. Then a hybrid simulated annealing and genetic algorithm is...